Inter-frame predicted image synthesizing method

ABSTRACT

A method and apparatus for simplifying the operations for warping prediction, in which an image is divided into patches and each patch is deformed by affine transform or bilinear transform. Motion vectors of a plurality of representative points whose spatial interval has a special feature are obtained from at least one patch formed with a plurality of grid points. Information on the motion vectors is used for synthesis of a predicted image. The division for synthesizing a predicted image in the case of warping prediction is replaced with a shift operation, thereby simplifying the processing by a computer or dedicated hardware.

CROSS REFERENCE TO RELATED APPLICATION

The present application is a continuation of application Ser. No. 11/474,966, filed Jun. 27, 2006, which is a continuation of application Ser. No. 11/002,110, filed Dec. 3, 2004, now U.S. Pat. No. 7,139,314; which is a continuation of application Ser. No. 10/173,776, filed Jun. 19, 2002, now U.S. Pat. No. 6,961,380; which is a continuation of application Ser. No. 08/933,377, filed Sep. 19, 1997, now U.S. Pat. No. 6,526,095 and is related to application Ser. No. 08/516,218 filed Aug. 17, 1995, now U.S. Pat. No. 5,684,538, entitled “SYSTEM AND METHOD FOR PERFORMING VIDEO CODING/DECODING USING MOTION COMPENSATION” and application Ser. No. 08/819,628, filed Mar. 17, 1997, now U.S. Pat. No. 6,008,852, entitled “VIDEO CODER WITH GLOBAL MOTION COMPENSATION”, the disclosures of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

The present invention relates to a method and apparatus for coding and decoding an image. More particularly, the present invention relates to a method and apparatus for coding and decoding an image by the use of a motion compensation method wherein an image is divided into patches and each patch is deformed by affine or bilinear transformation.

In the case of high-efficiency coding of a dynamic image by a coder, it is known that motion compensation is very effective for data compression due to similarity between frames temporally close to each other. The processing of an image by motion compensation is performed according to equation 1, shown below. In this case, it is assumed that the predicted image of a frame (current frame) to be coded is P(x,y) and a reference image (decoded image of a frame which is temporally close to P and whose coding is already completed) is R(x,y). Moreover, it is assumed that x and y are integers and a pixel is present at a point whose coordinate values are integers for P and R. In this case, the relation between P and R is given by equation 1.

$\begin{matrix}{P(x,y) = R\left(f_{i}(x,y),\, g_{i}(x,y)\right),\quad (x,y) \in P_{i},\quad 0 \leq i < N} & {\text{Equation 1}}\end{matrix}$

In this case, Pi denotes the pixels included in the i-th patch of an image, assuming that the image is divided into N small regions (patches). Moreover, transformation functions fi(x,y) and gi(x,y) show the spatial correspondence between the image of a current frame and a reference image. In this case, the motion vector of the pixel (x,y) in Pi can be shown by (fi(x,y)−x, gi(x,y)−y) by using the coordinates of a pixel in a predicted image as a starting point and the coordinates of a correspondent point in the reference image as an ending point.

In the case of H.261, Moving Picture Experts Group (MPEG) 1, and MPEG2, which are the international standards for the video coding method, a method referred to as block matching is used in which fi(x,y)−x and gi(x,y)−y are constants unrelated to x or y. However, to achieve a data compression ratio higher than these standard coding methods, it is required to use a higher-level motion compensation method. As such a new motion compensation method, a method has recently been proposed in which fi(x,y)−x and gi(x,y)−y are not constants, so that pixels in the same patch can have different motion vectors. As transformation functions of these methods, the following examples have been disclosed.

“Basic study of motion compensation according to triangle patch”, by Nakaya et al., Technical report of IEICE, IE90-106, Hei 2-03 discloses an example of affine transform as follows:

fi(x,y)=a _(i0) x+a _(i1) y+a _(i2)

gi(x,y)=a _(i3) x+a _(i4) y+a _(i5)  Equation 2

“Motion compensation for video compression using control grid interpolation”, by G. J. Sullivan et al., Proc. ICASSP '91, M9.1, pp. 2713-2716, 1991-05 discloses an example of bilinear transform as follows:

fi(x,y)=b _(i0) xy+b _(i1) x+b _(i2) y+b _(i3)

gi(x,y)=b _(i4) xy+b _(i5) x+b _(i6) y+b _(i7)  Equation 3

In the above equations, aij and bij denote motion parameters estimated for each patch. When the value of a transformation function is not an integer, coordinate values are not integers. Therefore, it is necessary to obtain the luminance value of a point where no pixel is actually present in a reference image. In this case, bilinear interpolation using four peripheral pixels is frequently performed. Writing this interpolation method as an equation, R(x+ξ, y+η) is given below, assuming 0≦ξ,η<1.

R(x+ξ,y+η)=(1−η)((1−ξ)R(x,y)+ξR(x+1,y))+η((1−ξ)R(x,y+1)+ξR(x+1,y+1))  Equation 4
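The interpolation of equation 4 can be illustrated with a minimal Python sketch. The function name `bilinear_sample` and the row-major indexing `R[y][x]` are assumptions made for illustration only and are not part of the disclosure.

```python
def bilinear_sample(R, x, y, xi, eta):
    """Sample the reference image R at (x + xi, y + eta) per equation 4.

    R is assumed to be indexed as R[y][x]; x and y are integers and
    0 <= xi, eta < 1 are the fractional offsets.
    """
    # Interpolate horizontally on the upper and lower pixel rows.
    upper = (1 - xi) * R[y][x] + xi * R[y][x + 1]
    lower = (1 - xi) * R[y + 1][x] + xi * R[y + 1][x + 1]
    # Interpolate vertically between the two rows.
    return (1 - eta) * upper + eta * lower


# Hypothetical 2x2 luminance block.
R = [[10, 20],
     [30, 40]]
print(bilinear_sample(R, 0, 0, 0.5, 0.5))  # 25.0
```

When both offsets are 0 the function returns R(x,y) itself, so integer-valued motion vectors need no interpolation.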

Hereinafter, the motion compensation method using the transformation functions in the above equations 2 and 3 is referred to as warping prediction.

To transmit motion information, a video coder must transmit information capable of specifying the motion parameters of a transformation function to the receiving side in some manner. For example, suppose the transformation function uses affine transform and the shape of a patch is a triangle. In this case, whether the six motion parameters are transmitted directly or the motion vectors of the three apexes of the patch are transmitted, the receiving side must be able to reproduce the six motion parameters ai0 to ai5.

FIGS. 1 a-d illustrate an example of the motion compensation method of transmitting motion vectors of apexes (grid points) of a triangle patch. FIGS. 1 a-d illustrate the processing for synthesizing a predicted image of an original image 102 of a current frame by using a reference image 101. First, the current frame is divided into a plurality of polygonal patches and formed into a patch-divided image 108 as illustrated in FIG. 1 d. An apex of a patch is referred to as a grid point and each grid point is shared by a plurality of patches. For example, a patch 109 is formed with grid points 110, 111, and 112 and these grid points also serve as apexes of another patch. Thus, after an image is divided into a plurality of patches, motion estimation is performed.

In the case of the example shown in FIGS. 1 a-b, motion estimation is performed between the predicted image and the reference image for each grid point. As a result, each patch is deformed in a reference image 103 after motion estimation as illustrated in FIG. 1 c. For example, the patch 109 corresponds to a deformed patch 104. This is because it is estimated that grid points 105, 106, and 107 respectively correspond to the grid points 110, 111, and 112. In the case of this example, when assuming the coordinates of the grid points 110, 111, and 112 as (I,J), (I+r,J), and (I,J+s) (where I, J, r, and s are integers) respectively and the motion vectors of the points as (U0,V0), (U1,V1), and (U2,V2) respectively, the motion vector (ua(x,y),va(x,y)) at the point (x,y) in the patch can be shown by the following equation 5 in accordance with the relation of the equation 2.

$\begin{matrix}{{{u_{a}\left( {x,y} \right)} = {{\frac{U_{1} - U_{0}}{r}\left( {x - I} \right)} + {\frac{U_{2} - U_{0}}{s}\left( {y - J} \right)} + U_{0}}}{{v_{a}\left( {x,y} \right)} = {{\frac{V_{1} - V_{0}}{r}\left( {x - I} \right)} + {\frac{V_{2} - V_{0}}{s}\left( {y - J} \right)} + V_{0}}}} & {{Equation}\mspace{14mu} 5}\end{matrix}$

By using the relation, it is possible to obtain the motion vector for each pixel and synthesize a predicted image.
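As an illustration of equation 5, the following Python sketch evaluates the per-pixel motion vector of a triangular patch from the motion vectors of its three grid points. The function name and the use of exact rational arithmetic (`fractions.Fraction`) are assumptions chosen so that the two divisions by r and s are visible; the grid-point vectors are indexed 0 to 2 as in equation 5.

```python
from fractions import Fraction


def affine_motion_vector(x, y, I, J, r, s, U, V):
    """Motion vector (ua, va) at pixel (x, y) per equation 5.

    U = (U0, U1, U2) and V = (V0, V1, V2) are the motion vector components
    at the grid points (I, J), (I + r, J) and (I, J + s).
    """
    ua = Fraction(U[1] - U[0], r) * (x - I) + Fraction(U[2] - U[0], s) * (y - J) + U[0]
    va = Fraction(V[1] - V[0], r) * (x - I) + Fraction(V[2] - V[0], s) * (y - J) + V[0]
    return ua, va


# Hypothetical patch with apexes (0,0), (16,0), (0,16).
U, V = (1, 3, 5), (0, 2, -4)
print(affine_motion_vector(8, 8, 0, 0, 16, 16, U, V))  # (Fraction(4, 1), Fraction(-1, 1))
```

At each grid point the function reproduces that grid point's own motion vector, which is a quick sanity check on the interpolation.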

Moreover, when deforming a quadrangular patch by using bilinear transform, the motion vector (ub(x,y),vb(x,y)) at a point (x,y) in the patch can be shown by the following equation 6 in accordance with the relation of the equation 3 by assuming the coordinates of grid points as (I,J), (I+r,J), (I,J+s), and (I+r,J+s) and the motion vectors of the points as (U0,V0), (U1,V1), (U2,V2), and (U3,V3).

$\begin{matrix}\begin{matrix}{{{ub}\left( {x,y} \right)} = {{\frac{J + s - y}{s}\left( {{\frac{I + r - x}{r}U_{0}} + {\frac{x - I}{r}U_{1}}} \right)} +}} \\{{\frac{y - J}{s}\left( {{\frac{I + r - x}{r}U_{2}} + {\frac{x - I}{r}U_{3}}} \right)}} \\{= {{\frac{U_{0} - U_{1} - U_{2} + U_{3}}{rs}\left( {x - I} \right)\left( {y - J} \right)} +}} \\{{{\frac{{- U_{0}} + U_{1}}{r}\left( {x - I} \right)} + {\frac{{- U_{0}} + U_{2}}{s}\left( {y - J} \right)} + U_{0}}} \\{{{vb}\left( {x,y} \right)} = {{\frac{V_{0} - V_{1} - V_{2} + V_{3}}{rs}\left( {x - I} \right)\left( {y - J} \right)} +}} \\{{{\frac{{- V_{0}} + V_{1}}{r}\left( {x - I} \right)} + {\frac{{- V_{0}} + V_{2}}{s}\left( {y - J} \right)} + V_{0}}}\end{matrix} & {{Equation}\mspace{14mu} 6}\end{matrix}$

By introducing the above-described warping prediction, it is possible to accurately approximate the motion vectors of an image sequence and realize a high data compression ratio. However, the throughput for coding and decoding decreases compared to a conventional method. Particularly, the divisions performed in the equations 5 and 6 are large factors for complicating the processing. Therefore, the warping prediction using affine transform and bilinear transform has a problem that the throughput for synthesizing a predicted image decreases.

SUMMARY OF THE INVENTION

It is an object of the present invention to decrease the number of operations performed in an image coding process by replacing the division processing in the warping prediction with a binary shift operation.

The division processing can be realized by a shift operation by obtaining and using the motion vector of an assumed grid point (representative point).

The present invention provides a method, apparatus and computer program for synthesizing an inter-frame predicted image using a motion compensation process of dividing an image into N patches, wherein N is a positive integer, and deforming the patches through affine transform. Pixel sampling of an image is performed at an interval of 1 in the horizontal and vertical directions and a sampling point is present on a point in which the horizontal and vertical components of coordinates are integers.

The inter-frame predicted image is synthesized by obtaining motion vectors of three representative points in which coordinates are represented by (I′, J′), (I′+p, J′), and (I′, J′+q) in at least one patch Pa, and computing a motion vector of each pixel in the patch using the motion vectors of the three representative points. Either p or −p is equal to 2^(α), where α is a positive integer, in the patch Pa, and either q or −q is equal to 2^(β), where β is a positive integer, in the patch Pa. A triangle formed by said three representative points in the patch Pa does not coincide with the shape of the patch.

The present invention also provides a method, apparatus and computer program for synthesizing an inter-frame predicted image using a motion compensation process of dividing an image into N patches, wherein N is a positive integer, and deforming the patches through bilinear transform. Pixel sampling of an image is performed at an interval of 1 in the horizontal and vertical directions and a sampling point is present on a point in which the horizontal and vertical components of coordinates are integers.

The inter-frame predicted image is synthesized by obtaining motion vectors of four representative points in which coordinates are represented by (I′,J′), (I′+p,J′), (I′,J′+q), and (I′+p,J′+q) in at least one patch Pb and computing a motion vector of each pixel in the at least one patch using the motion vectors of the four representative points. Either p or −p is equal to 2^(α), where α is a positive integer, in the at least one patch Pb and either q or −q is equal to 2^(β), where β is a positive integer, in the at least one patch Pb. A rectangle formed by the four representative points in the at least one patch Pb does not coincide with the shape of the patch.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be more apparent from the following detailed description, when taken in conjunction with the accompanying drawings, in which:

FIGS. 1 a-d illustrate a reference image and an original image for the processing of warping prediction according to affine transform;

FIG. 2 illustrates a patch, grid point, and representative point for the warping prediction according to affine transform;

FIG. 3 illustrates a patch, grid point, and representative point for the warping prediction according to bilinear transform;

FIG. 4 illustrates a flow chart for performing video coding according to an embodiment of the invention;

FIG. 5 illustrates a flow chart for video decoding according to an embodiment of the invention;

FIG. 6 is a diagram of a software encoder for a video coding method according to an embodiment of the invention;

FIG. 7 is a diagram of a software decoder for a video decoding method according to the present invention;

FIG. 8 is an overall diagram of a video encoder of the present invention; and

FIG. 9 is a diagram of a motion compensation unit used in the encoder of FIG. 8, according to one embodiment of the invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

In the following, it is assumed that a pixel sampling interval is equal to 1 in both horizontal and vertical directions and a pixel is present at a point in which the horizontal and vertical components of coordinates are integers. Moreover, the present invention is realized by applying the invention related to a method for accelerating an operation for global motion compensation, as disclosed in Japanese Patent Application No. 060572/1996, to warping prediction.

When performing the warping prediction using affine transform or bilinear transform, it is possible to obtain the advantages that mismatching is prevented and operations are simplified by quantizing the motion vector for each pixel as disclosed in Japanese Patent Application Nos. 193970/1994, 278629/1995. It is hereafter assumed that the horizontal and vertical components of the motion vector of a pixel are quantized to integral multiples of 1/m, wherein m is a positive integer. Moreover, it is assumed that the warping prediction for transmitting the motion vectors of grid points as described above is performed and the horizontal and vertical components of the motion vector of each grid point are quantized to integral multiples of 1/n, wherein n is a positive integer, and then transmitted.

In this case, assume that (u00,v00), (u01,v01), and (u02,v02), wherein u00, v00, u01, v01, u02, and v02 are integers, are obtained by multiplying by n the horizontal and vertical components of the motion vectors of grid points 202, 203, and 204 located at the apexes (I,J), (I+r,J), and (I,J+s), wherein I, J, r, and s are integers, of the patch 201 illustrated in FIG. 2. Then (u(x,y),v(x,y)), wherein u(x,y) and v(x,y) are integers, obtained by multiplying by m the horizontal and vertical components of the motion vector of a pixel in the patch formed by the above grid points, can be shown by the following equation 7 in accordance with equation 5.

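$\begin{matrix}{\begin{aligned} u(x,y) &= \left((u_{01} - u_{00})(x - I)s + (u_{02} - u_{00})(y - J)r + u_{00}rs\right)m\ //\ (rsn) \\ v(x,y) &= \left((v_{01} - v_{00})(x - I)s + (v_{02} - v_{00})(y - J)r + v_{00}rs\right)m\ //\ (rsn) \end{aligned}} & {\text{Equation 7}}\end{matrix}$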

In the above equation 7, the symbol “//” denotes a division that rounds the result of an ordinary real-number division to a nearby integer when the result is not an integer; its priority as an operator is equal to that of a multiplication or division. To decrease an operation error, it is preferable that a non-integer value is rounded to the nearest integer. In this case, the following methods are considered as methods for rounding a value obtained by adding ½ to an integer.

(1) To round the value in the direction toward 0.
(2) To round the value in the direction away from 0.
(3) To round the value in the direction toward 0 when a dividend is negative but in the direction away from 0 when the dividend is positive, wherein it is assumed that a divisor is always positive.
(4) To round the value in the direction away from 0 when a dividend is negative but in the direction toward 0 when the dividend is positive, wherein it is assumed that a divisor is always positive.

Among the above methods (1) to (4), the methods (3) and (4) are advantageous for throughput because the rounding direction does not change regardless of whether the dividend is positive or negative, so that no decision as to the sign of the dividend is required. For example, high-speed processing using the method (3) can be realized in accordance with the following equation 8.

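$\begin{matrix}{\begin{aligned} u(x,y) &= \left(Lrsn + \left((u_{01} - u_{00})(x - I)s + (u_{02} - u_{00})(y - J)r + u_{00}rs\right)m + (rsn\ \#\ 2)\right)\ \#\ (rsn) - L \\ v(x,y) &= \left(Mrsn + \left((v_{01} - v_{00})(x - I)s + (v_{02} - v_{00})(y - J)r + v_{00}rs\right)m + (rsn\ \#\ 2)\right)\ \#\ (rsn) - M \end{aligned}} & {\text{Equation 8}}\end{matrix}$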

In the above equation 8, the symbol “#” denotes a division between positive integers that drops the fractional portion in the direction of 0, which can most easily be realized by a computer. Moreover, in the above formula, L and M denote positive integers large enough to keep the dividend of a division always positive. Furthermore, the term (r·s·n#2) is used to round a division result to the nearest integer.

By using the equation 8, it is possible to realize the processing of the warping prediction using affine transform only by integer operations.
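The integer-only evaluation of equation 8 can be sketched in Python as follows. The helper names and the default offset value `1 << 16` (playing the role of L or M) are assumptions made for illustration; the offset only needs to be large enough to keep the dividend non-negative.

```python
def trunc_div_pos(a, d):
    """The '#' operator: divide non-negative a by positive d, dropping the fraction."""
    assert a >= 0 and d > 0
    return a // d


def round_div_offset(a, d, big=1 << 16):
    """Round a / d to the nearest integer using only '#', as in equation 8.

    Adding big * d keeps the dividend non-negative, and adding d # 2 turns
    truncation into rounding; halves are rounded per method (3).
    """
    return trunc_div_pos(big * d + a + trunc_div_pos(d, 2), d) - big


def affine_component(x, y, I, J, r, s, n, m, u00, u01, u02):
    """Horizontal component u(x, y) of equation 8 (v(x, y) is analogous)."""
    num = ((u01 - u00) * (x - I) * s + (u02 - u00) * (y - J) * r + u00 * r * s) * m
    return round_div_offset(num, r * s * n)


# Hypothetical patch with r = 12, s = 10 (not powers of 2), n = 2, m = 4.
print(affine_component(5, 3, 0, 0, 12, 10, 2, 4, 3, 7, -5))  # 5
print(round_div_offset(3, 2), round_div_offset(-3, 2))       # 2 -1
```

The second line of output shows the behaviour of method (3): 1.5 is rounded away from 0 and −1.5 is rounded toward 0, without testing the sign of the dividend.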

Moreover, if the value of r·s·n is equal to a positive integer power of 2, it is possible to replace a division with a binary shift operation and greatly simplify the operation. However, to realize the replacement, all of r, s, and n must be positive integer powers of 2.

A method for replacing a division with a shift operation even if at least one of r, s, and n is not a positive integer power of 2 is described below. By expanding the motion vector field described by the equation 7, assumed grid points (representative points) 205, 206, and 207 are arranged at (I′,J′), (I′+p,J′), and (I′,J′+q), wherein I′, J′, p, and q are integers, as shown in FIG. 2. In this example, although the coordinates of the grid point 202 coincide with the coordinates of the representative point 205, the “grid point 202” and the “representative point 205” have different meanings. An assumed patch is formed by these representative points and motion vectors are computed by using the assumed patch. It is assumed that the horizontal and vertical components of the motion vector of each representative point are quantized to integral multiples of 1/k, wherein k is a positive integer. When it is assumed that values obtained by multiplying the horizontal and vertical components of the motion vectors of the representative points 205, 206, and 207 by k are (u0,v0), (u1,v1), and (u2,v2) respectively, wherein u0, v0, u1, v1, u2, and v2 are integers, these values can be shown by the following equations 9 and 10.

u ₀ =u′(I′,J′)

v ₀ =v′(I′,J′)

u ₁ =u′(I′+p,J′)

v ₁ =v′(I′+p,J′)

u ₂ =u′(I′,J′+q)

v ₂ =v′(I′,J′+q)  Equation 9

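$\begin{matrix}{\begin{aligned} u'(x,y) &= \left((u_{01} - u_{00})(x - I)s + (u_{02} - u_{00})(y - J)r + u_{00}rs\right)k\ ///\ (rsn) \\ v'(x,y) &= \left((v_{01} - v_{00})(x - I)s + (v_{02} - v_{00})(y - J)r + v_{00}rs\right)k\ ///\ (rsn) \end{aligned}} & {\text{Equation 10}}\end{matrix}$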

In the above equation 10, u′(x,y) and v′(x,y) are integers and the symbol “///” denotes a division that rounds the result of an ordinary real-number division to a nearby integer when the result is not an integer; its priority as an operator is equal to that of a multiplication or division.

A preferable approximation of the equation 7 can be obtained by using the motion vectors of these representative points: (u″(x,y),v″(x,y)), wherein u″(x,y) and v″(x,y) are integers, obtained by multiplying by m the horizontal and vertical components of the motion vectors of the pixels in the assumed patch, is given by the following equation 11.

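$\begin{matrix}{\begin{aligned} u''(x,y) &= \left((u_{1} - u_{0})(x - I')q + (u_{2} - u_{0})(y - J')p + u_{0}pq\right)m\ //\ (pqk) \\ v''(x,y) &= \left((v_{1} - v_{0})(x - I')q + (v_{2} - v_{0})(y - J')p + v_{0}pq\right)m\ //\ (pqk) \end{aligned}} & {\text{Equation 11}}\end{matrix}$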

In this case, by setting the values of p, q, and k to positive integer powers of 2, it is possible to replace the divisions of the equation 11 with shift operations and greatly simplify the operations.
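The two-stage computation of equations 9 to 11 can be sketched in Python as follows. All function and parameter names are assumptions made for illustration. The sketch takes p = 2^alpha, q = 2^beta and k = 2^log2k, uses round-half-up for both “///” and “//” (method (3)), and relies on Python's `>>` being an arithmetic shift, so that the division by p·q·k becomes an add and a shift.

```python
def round_div(a, d):
    """Round a / d (d > 0) to the nearest integer, halves rounded per method (3)."""
    return (2 * a + d) // (2 * d)  # floor(a / d + 1/2)


def representative_vectors(I, J, r, s, n, grid_u, Ip, Jp, alpha, beta, k):
    """Equations 9 and 10: components u0, u1, u2 at the representative points."""
    u00, u01, u02 = grid_u
    p, q = 1 << alpha, 1 << beta

    def u_prime(x, y):
        num = ((u01 - u00) * (x - I) * s + (u02 - u00) * (y - J) * r + u00 * r * s) * k
        return round_div(num, r * s * n)  # the only true division, done once per patch

    return u_prime(Ip, Jp), u_prime(Ip + p, Jp), u_prime(Ip, Jp + q)


def pixel_component_shift(x, y, Ip, Jp, alpha, beta, log2k, m, rep_u):
    """Equation 11 with the division by p*q*k replaced by a right shift."""
    u0, u1, u2 = rep_u
    p, q = 1 << alpha, 1 << beta
    num = ((u1 - u0) * (x - Ip) * q + (u2 - u0) * (y - Jp) * p + u0 * p * q) * m
    shift = alpha + beta + log2k
    return (num + (1 << (shift - 1))) >> shift


# Hypothetical patch: r = 12, s = 10, n = 2; assumed patch: p = q = 16, k = 16, m = 4.
rep = representative_vectors(0, 0, 12, 10, 2, (3, 7, -5), 0, 0, 4, 4, 16)
print(rep)                                                # (24, 67, -78)
print(pixel_component_shift(5, 3, 0, 0, 4, 4, 4, 4, rep))  # 5
```

In this sketch the division by r·s·n is performed only three times per patch, when the representative vectors are derived, while every pixel is handled with multiplications, additions and one shift. For the sample pixel the result agrees with a direct evaluation of equation 7 using the same grid-point values.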

For warping prediction, the size of a patch is an important parameter for determining a coding characteristic. In general, when decreasing the size of a patch, the accuracy of motion compensation prediction is improved but the number of motion parameters increases by a value equivalent to the decrease of the size and the amount of motion information to be transmitted increases. However, when increasing the size of a patch, the amount of motion information decreases but the prediction characteristic is deteriorated by a value equivalent to the increase of the size. Therefore, in the case of the above example, the values of r and s at which the best coding characteristic is obtained are not always positive integer powers of 2. When r or s is not a positive integer power of 2, it is possible to execute a high-speed operation similarly to the case in which r and s are positive integer powers of 2 by applying the above-described method of using an assumed patch to the above example.

The method of using an assumed patch can also be applied to the warping prediction using bilinear transform. In the case of the description below, it is assumed that the horizontal and vertical components of the motion vectors of a pixel, grid point, and representative point are quantized to integral multiples of 1/m, 1/n, and 1/k, wherein m, n, and k are positive integers, respectively, similarly to the above case. Moreover, the definitions of symbols “//” and “///” are the same as mentioned above. For the patch 301 shown in FIG. 3, when assuming that values obtained by multiplying the horizontal and vertical components of the motion vectors of grid points 302, 303, 304, and 305 located at (I,J), (I+r,J), (I,J+s), and (I+r,J+s), wherein I, J, r, and s are integers, by n are (u00,v00), (u01,v01), (u02,v02), and (u03,v03), wherein u00, v00, u01, v01, u02, v02, u03, and v03 are integers, respectively, (u(x,y),v(x,y)), wherein u(x,y) and v(x,y) are integers, obtained by multiplying the horizontal and vertical components of the motion vectors of the pixels in a patch formed with these grid points by m can be shown by the following equation 12.

u(x,y)=((J+s−y)((I+r−x)u ₀₀+(x−I)u ₀₁)+(y−J)((I+r−x)u ₀₂+(x−I)u₀₃))m//(rsn)

v(x,y)=((J+s−y)((I+r−x)v ₀₀+(x−I)v ₀₁)+(y−J)((I+r−x)v ₀₂+(x−I)v₀₃))m//(rsn)  Equation 12

The motion vector field described by the above equation 12 is expanded to arrange representative points 306, 307, 308, and 309 at (I′,J′), (I′+p,J′), (I′,J′+q), and (I′+p,J′+q), wherein I′, J′, p, and q are integers, as shown in FIG. 3. In this example, although the coordinates of the grid point 302 coincide with the coordinates of the representative point 306, the “grid point 302” and the “representative point 306” have different meanings. An assumed patch is formed with these representative points and motion vectors are computed by using the patch. When assuming that values obtained by multiplying the horizontal and vertical components of the motion vectors of the representative points 306, 307, 308, and 309 by k are (u0,v0), (u1,v1), (u2,v2), and (u3,v3), wherein u0, v0, u1, v1, u2, v2, u3, and v3 are integers, respectively, these values can be shown by the following equations 13 and 14.

u ₀ =u′(I′,J′)

v ₀ =v′(I′,J′)

u ₁ =u′(I′+p,J′)

v ₁ =v′(I′+p,J′)

u ₂ =u′(I′,J′+q)

v ₂ =v′(I′,J′+q)

u ₃ =u′(I′+p,J′+q)

v ₃ =v′(I′+p,J′+q)  Equation 13

u′(x,y)=((J+s−y)((I+r−x)u ₀₀+(x−I)u ₀₁)+(y−J)((I+r−x)u ₀₂+(x−I)u ₀₃))k///(rsn)

v′(x,y)=((J+s−y)((I+r−x)v ₀₀+(x−I)v ₀₁)+(y−J)((I+r−x)v ₀₂+(x−I)v ₀₃))k///(rsn)  Equation 14

In the above equations 13 and 14, u′(x,y) and v′(x,y) are integers and the definition of symbol “///” is the same as the above.

By using the motion vectors of these representative points, (u″(x,y),v″(x,y)), wherein u″(x,y) and v″(x,y) are integers, obtained by multiplying by m the horizontal and vertical components of the motion vectors of the pixels in the assumed patch, can be shown by the following equation 15, which gives a preferable approximation of the equation 12.

u″(x,y)=((J′+q−y)((I′+p−x)u ₀+(x−I′)u ₁)+(y−J′)((I′+p−x)u ₂+(x−I′)u₃))m//(pqk)

v″(x,y)=((J′+q−y)((I′+p−x)v ₀+(x−I′)v ₁)+(y−J′)((I′+p−x)v ₂+(x−I′)v₃))m//(pqk)  Equation 15

In this case, by setting the values of p, q, and k to positive integer powers of 2, it is possible to replace the divisions in the equation 15 with shift operations and greatly simplify the operations.
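A minimal Python sketch of equation 15 under the same assumptions (p = 2^alpha, q = 2^beta, k = 2^log2k, halves rounded per method (3)) is shown below. The function and parameter names are hypothetical.

```python
def bilinear_component_shift(x, y, Ip, Jp, alpha, beta, log2k, m, rep_u):
    """Equation 15 with the division by p*q*k replaced by a right shift.

    rep_u = (u0, u1, u2, u3) are the k-scaled components at the representative
    points (I', J'), (I'+p, J'), (I', J'+q) and (I'+p, J'+q).
    """
    u0, u1, u2, u3 = rep_u
    p, q = 1 << alpha, 1 << beta
    upper = (Ip + p - x) * u0 + (x - Ip) * u1  # weights along the edge y = J'
    lower = (Ip + p - x) * u2 + (x - Ip) * u3  # weights along the edge y = J' + q
    num = ((Jp + q - y) * upper + (y - Jp) * lower) * m
    shift = alpha + beta + log2k
    # Python's >> is an arithmetic shift, so this is floor(num / (p*q*k) + 1/2).
    return (num + (1 << (shift - 1))) >> shift


# Hypothetical assumed patch: p = q = 8, k = 4, m = 2.
rep_u = (4, 8, -4, 12)
print(bilinear_component_shift(4, 4, 0, 0, 3, 3, 2, 2, rep_u))  # 3
```

At the centre of the assumed patch the exact value is 2.5 in units of 1/m, which is rounded to 3 under method (3).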

When performing the warping prediction using an assumed patch, it is possible to consider the methods for transmitting (i) the motion vector of a grid point, (ii) the motion vector of a representative point, and (iii) a motion parameter. The method (ii) is advantageous from the viewpoint that the decoder need not perform the computation for obtaining the motion vector of a representative point. However, when one grid point is shared by a plurality of patches as shown in FIG. 1, the method (i) is more advantageous. Moreover, when a certain limited range (e.g. a range within ±32 pixels) is provided for the motion vectors of pixels in a patch, the method (i) has a feature that the motion vector of a grid point to be transmitted is always kept in the limited range.

This feature is effective when designing the code word of a motion vector to be transmitted or determining the number of digits necessary for operations. However, it is necessary to carefully perform the operations (equations 9, 10, 13, and 14) for obtaining the motion vector of a representative point from the motion vector of a grid point in accordance with the method (i). It is necessary to consider that a motion vector obtained from the equation 11 or 15 may go out of a set limited range because of an error due to rounding into an integer. Particularly, it is necessary to pay attention to the case in which a representative point is located inside of a patch. This is because, in the case of the equation 11 or 15, a motion vector is obtained through extrapolation for a pixel located outside of the triangle or rectangle enclosed by representative points, and thereby the rounding error introduced when obtaining the motion vector of a representative point may be amplified. Therefore, it is preferable that a representative point is located outside of a patch. However, when increasing the size of an assumed patch, the range of the value of the motion vector of a representative point is widened and the number of digits for operations increases. Therefore, a larger patch is not always better. As a result, among assumed patches that include the original patch, one as small as possible is preferable.

In the case of the method (iii), for example, the motion parameter aij of the equation 2 is transmitted. In this case, it is possible to obtain the motion vector of a representative point from the parameter and compute the motion vectors of the pixels in a patch by using the equation 11. In this case, it is also possible to use a parameter representing an enlargement or reduction rate or a rotation angle in addition to aij of the equation 2 or bij of the equation 3 as a motion parameter to be transmitted.

The present invention makes it possible to substitute the shift operation for the division for synthesizing a predicted image of global motion compensation, and to simplify the processing using either software or dedicated hardware or a combination of both.

FIG. 4 illustrates the steps followed in performing video coding of video image data using local motion compensation according to an embodiment of the present invention. In step 150, a video signal is input and in step 151, motion estimation for warping prediction is performed between an input image and the decoded image of a previous frame. Then, the motion vectors are derived from the grid points of the input image in step 152.

In the next step, step 153, a predicted image of motion compensation is synthesized using a fast algorithm. The fast algorithm is a general expression for algorithms disclosed herein, such as the bilinear algorithm and affine algorithm. For example, equation 2 is an affine algorithm whereas equation 3 is a bilinear algorithm. Further, equations 5, 7, 8 and 10 are affine whereas equations 6 and 12 are bilinear.

Then, in step 157, the error image is synthesized by calculating the difference between the predicted image and the input image, and in step 158 the error image is subjected to a discrete cosine transform and the DCT coefficients are quantized. Finally, in step 159, the compressed video data is output.

FIG. 5 illustrates a flow chart of the video decoding according to the present invention. In step 160, an input bit stream, such as an H.261 bit stream, is received as the compressed video data. The motion vectors of the grid points are derived in step 161 and, in step 162, the predicted image of motion compensation is synthesized. The error image with respect to the predicted image is synthesized in step 165 and the error image is added to the predicted image in step 166. In step 167, the reconstructed video signal is output to complete the decoding of the encoded video data.

FIGS. 6 and 7 are block diagrams of the components of the encoder and decoder of the invention for storing and executing software operating as disclosed in the flowcharts of FIGS. 4 and 5. The components in common for both diagrams have the same reference numbers and include the data bus 140, CPU 142 and storage device 143. The encoder program for executing the video encoding is illustrated in FIG. 6, and is stored in storage device 143. The decoder program for executing the video decoding is illustrated in FIG. 7, and is stored in storage device 143. Storage device 143 is a storage medium, such as a hard disk drive, floppy disk or optical disk, for example.

With reference to FIG. 6, an input video signal is A/D converted by A/D converter 141 and sent to CPU 142 over bus 140. CPU 142 retrieves and executes the encoder program 144 stored in storage device 143 and then encodes and compresses the video data received from the A/D converter 141. After the video data is encoded, it is stored in an output buffer 145 and output as output data. Control data and timing signals are also output with the compressed video data.

FIG. 7 illustrates the processing of a coded video signal, which is received at input buffer 148 and then read by CPU 142. CPU 142, which retrieves the decoder program 147 from the storage device 143, executes the decoding of the coded video data. The decoded video data is then sent over bus 140 to D/A converter 146 for outputting an analog video signal.

FIG. 8 illustrates the overall block diagram of a video coder accordingto the invention. More particularly, FIG. 8 illustrates the constructionof a video coder 1001 of the H.261 Standard which employs a hybridcoding system (adaptive interframe/intraframe coding method) which is acombination of block matching and DCT (discrete cosine transform). Asubtractor 102 calculates the difference between an input image(original image of present frame) 101 and an output image 113 (that willbe described later) of an interframe/intraframe switching unit 119, andoutputs an error image 103. The error image is transformed into a DCTcoefficient through a DCT processor 104 and is quantized through aquantizer 105 to obtain a quantized DCT coefficient 106. The quantizedDCT coefficient is output as transfer data onto a communication line andis, at the same time, used in the coder to synthesize an interframepredicted image. A procedure for synthesizing the predicted image willbe described below. The quantized DCT coefficient 106 passes through adequantizer 108 and an inverse DCT processor 109 to form a reconstructederror image 110 (the same as the error image reproduced on the receivingside).

The output image 113 (that will be described later) of the interframe/intraframe switching unit 119 is added to the reconstructed error image 110 through an adder 111, thereby obtaining a reconstructed image 112 of the present frame (the same image as the reconstructed image of the present frame reproduced on the receiving side). The image is temporarily stored in a frame memory 114 and is delayed in time by one frame. At the present moment, therefore, the frame memory 114 is outputting a reconstructed image 115 of the preceding frame. The reconstructed image 115 of the preceding frame and the original image 101 of the present frame are input to motion estimation and compensation unit 1002.

In the motion estimation and compensation unit 1002, an image is divided into a plurality of blocks, and a portion most resembling the original image of the present frame is taken out for each of the blocks from the reconstructed image of the preceding frame, thereby synthesizing a predicted image 117 of the present frame. At this moment, it is necessary to execute a processing (local motion estimation) for detecting how much the blocks have moved from the preceding frame to the present frame. The motion vectors of the blocks detected by the motion estimation are transmitted to the receiving side as motion data 120. From the motion data and the reconstructed image of the preceding frame, the receiving side can synthesize a predicted image which is the same as the one that is obtained independently on the transmitting side.
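The local motion estimation described above can be illustrated by a minimal sketch of full-search block matching. The sketch is not taken from the specification: the function names `sad` and `block_match`, the sum-of-absolute-differences cost, the integer-pixel search, and the square search window are all assumptions chosen for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of the current
    frame at (bx, by) and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def block_match(cur, ref, bx, by, n, search):
    """Return (dx, dy, cost) of the best-matching reference block.

    Assumption: exhaustive integer-pixel search within +/- search pixels;
    candidates that fall outside the reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + n <= height and
                      0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]


# Hypothetical test data: a 12 x 12 reference frame, and a current frame
# whose content is the reference displaced by (2, 1), clamped at the border.
ref = [[x * x + 3 * y * y + x * y for x in range(12)] for y in range(12)]
cur = [[ref[min(y + 1, 11)][min(x + 2, 11)] for x in range(12)]
       for y in range(12)]
motion = block_match(cur, ref, 2, 2, 4, 2)
```

In this sketch the returned displacement plays the role of the motion vector of one block; a coder of the type shown in FIG. 8 would transmit such vectors as motion data 120.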

Referring again to FIG. 8, the predicted image 117 is input together with a “0” signal 118 to the interframe/intraframe switching unit 119. Upon selecting either of the two inputs, the switching unit switches the coding to either the interframe coding or the intraframe coding. When the predicted image 117 is selected, the interframe coding is executed. When the “0” signal is selected, on the other hand, the input image is directly DCT-coded and is output to the communication line; that is, the intraframe coding is executed.

In order to properly obtain the reconstructed image on the receiving side, it becomes necessary to know whether the interframe coding is executed or the intraframe coding is executed on the transmitting side. For this purpose, a distinction flag 121 is output to the communication line. The final H.261 coded bit stream 123 is obtained by multiplexing the quantized DCT coefficient, motion vector, and interframe/intraframe distinction flag into multiplexed data in a multiplexer 122.
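The closed prediction loop of FIG. 8 can be illustrated by the following minimal sketch. It is a simplification under stated assumptions, not the H.261 procedure itself: the DCT and inverse DCT stages are omitted (the error image is quantized directly), the quantizer is a uniform quantizer with a hypothetical step size `STEP`, images are flat lists of pixel values, and the names `encode_frame` and `decode_frame` are illustrative only.

```python
STEP = 8  # hypothetical quantizer step size (assumption)


def quantize(error):
    """Uniform quantization of the error image (DCT stage omitted)."""
    return [(e + STEP // 2) // STEP for e in error]


def dequantize(levels):
    """Inverse of quantize(), up to the quantization error."""
    return [level * STEP for level in levels]


def switch_output(prediction, use_inter, size):
    """Interframe/intraframe switch: the prediction or the "0" signal."""
    return prediction if use_inter else [0] * size


def encode_frame(frame, prediction, use_inter):
    """Return the transmitted levels and the coder's reconstructed frame."""
    selected = switch_output(prediction, use_inter, len(frame))
    error = [f - s for f, s in zip(frame, selected)]        # subtractor
    levels = quantize(error)                                # transmitted data
    recon_error = dequantize(levels)                        # local decoding
    recon = [s + r for s, r in zip(selected, recon_error)]  # adder
    return levels, recon


def decode_frame(levels, prediction, use_inter):
    """Receiving side: rebuild the frame from levels and the flag."""
    selected = switch_output(prediction, use_inter, len(levels))
    return [s + r for s, r in zip(selected, dequantize(levels))]


# Hypothetical data: the prediction stands in for a motion-compensated image.
frame = [100, 103, 97, 120]
prediction = [96, 104, 96, 112]
levels, recon = encode_frame(frame, prediction, True)
decoded = decode_frame(levels, prediction, True)
```

Because the coder forms its reconstructed image from the quantized data rather than from the original error image, the frame held in the coder's frame memory matches the frame obtained on the receiving side, which is the property noted in connection with the reconstructed image 112.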

In FIG. 9, a motion estimation and compensation unit 1003 that performs warping prediction using fast synthesis of the predicted image is shown. Unit 1003 can be used as the motion estimation and compensation unit 1002 of FIG. 8.

As shown in FIG. 9, an input video signal 101 is received by the motion estimation unit 1004. Motion estimation is performed between an input image and the decoded image of a previous frame by the motion estimation unit 1004. Unit 1004 also derives the motion vectors of the grid points of the input image, which are output as motion information 120 to the multiplexer 122. Motion information 120 is also transmitted to the predicted image synthesizer 1005, which synthesizes the predicted image of motion compensation using the fast algorithm and outputs the predicted image 117 of the present frame to the subtractor 102 for synthesizing the error image by calculating the difference between the predicted image and the input image.
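The per-pixel computation that such a fast synthesis relies on can be illustrated by the following minimal sketch, which evaluates the affine interpolation recited in claim 1 with a shift in place of the division by p·q·k. The sketch makes several assumptions that are not stated in the specification: p and q are taken as positive (p = 2^α, q = 2^β), m is supplied as an arbitrary positive integer, the “//” operation is realized as rounding half toward positive infinity, and the function name `pixel_motion_vector` and its parameter names are hypothetical.

```python
def pixel_motion_vector(x, y, i0, j0, alpha, beta, h0, m, mv0, mv1, mv2):
    """Return (u, v), the motion vector of pixel (x, y) multiplied by m.

    mv0, mv1, mv2 are the motion vectors at the representative points
    (i0, j0), (i0 + p, j0) and (i0, j0 + q), each multiplied by k = 2**h0,
    so that all of their components are integers.
    """
    p = 1 << alpha                  # assumption: p > 0
    q = 1 << beta                   # assumption: q > 0
    shift = alpha + beta + h0       # p * q * k == 2**shift
    half = 1 << (shift - 1)         # rounding offset; shift >= 2 here

    def interpolate(c0, c1, c2):
        numerator = (c0 * p * q
                     + (c1 - c0) * (x - i0) * q
                     + (c2 - c0) * (y - j0) * p) * m
        # An arithmetic right shift replaces the division by p * q * k.
        return (numerator + half) >> shift

    return (interpolate(mv0[0], mv1[0], mv2[0]),
            interpolate(mv0[1], mv1[1], mv2[1]))


# Hypothetical example: a 16 x 16 patch, half-pixel representative vectors
# (k = 2) and quarter-pixel per-pixel vectors (m = 4).
corner = pixel_motion_vector(0, 0, 0, 0, 4, 4, 1, 4, (2, -2), (6, -2), (2, 4))
center = pixel_motion_vector(8, 8, 0, 0, 4, 4, 1, 4, (2, -2), (6, -2), (2, 4))
```

Python's `>>` on integers is an arithmetic shift, so negative numerators are rounded consistently; in a language where right-shifting a negative value is implementation-defined, the same rounding would need to be written explicitly.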

While the present invention has been described in detail and pictorially in the accompanying drawings, it is not limited to such details since many changes and modifications recognizable to those of ordinary skill in the art may be made to the invention without departing from the spirit and the scope thereof.

1. A decoder for performing a motion compensation between a current image and a reference image for synthesizing an inter-frame predicted image, comprising: means for dividing said current image into N patches, N being a positive integer; means for arranging an assumed patch Pa corresponding to a patch Pb which is one of the N patches, Pa having three assumed grid points as representative points having coordinates (I′,J′), (I′+p,J′), and (I′,J′+q); means for obtaining motion vectors of the three representative points; and means for computing a motion vector of each pixel in the patch Pb by affine transformation using the motion vectors of the three representative points, wherein either of p and −p is equal to 2^(α), α being a positive integer, wherein either of q and −q is equal to 2^(β), β being a positive integer, wherein the horizontal and vertical components of the motion vectors at the representative points (I′,J′), (I′+p,J′), and (I′,J′+q) take only values that are integral multiples of 1/k, wherein k is equal to 2^(h0) and h0 is an integer other than a negative integer, wherein the horizontal and vertical components of the motion vector of each pixel take only values that are integral multiples of 1/m, wherein m is a positive integer, and wherein (u(x,y),v(x,y)), wherein x, y, u(x,y), and v(x,y) are integers, obtained by multiplying the horizontal and vertical components of the motion vector of a pixel (x,y) in the patch Pb by m, is shown using (u0,v0), (u1,v1), and (u2,v2), wherein u0, v0, u1, v1, u2, v2 are integers, obtained by multiplying the horizontal and vertical components of the motion vectors at the representative points (I′,J′), (I′+p,J′), and (I′,J′+q) by k, as per
u(x,y)=((u0·p·q+(u1−u0)(x−I′)·q+(u2−u0)(y−J′)·p)m)//(p·q·k), and
v(x,y)=((v0·p·q+(v1−v0)(x−I′)·q+(v2−v0)(y−J′)·p)m)//(p·q·k),
wherein symbol “//” denotes a division that rounds to a nearby integer an operation result obtained from a division of real numbers when the result is not an integer, and its priority as an operator is equal to that of a multiplication or division.